Publication rates are skyrocketing across many fields of science, and it is difficult to stay up to date with the latest research. This makes automatically summarizing the latest findings and helping scholars to synthesize related work in a given area an attractive research objective. In this paper we study the problem of citation text generation, where, given a set of cited papers and a citing context, the model should generate the citation text. While citation text generation has been tackled in prior work, existing studies use different datasets and task definitions, which makes it hard to study citation text generation systematically. To address this, we propose CiteBench: a benchmark for citation text generation that unifies the previous datasets and enables standardized evaluation of citation text generation models across task settings and domains. Using the new benchmark, we investigate the performance of multiple strong baselines, test their transferability between the datasets, and deliver new insights into task definition and evaluation to guide future research in citation text generation. We make CiteBench publicly available at https://github.com/UKPLab/citebench.
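A minimal sketch of the task's input/output structure as described above; the field names and helper below are hypothetical, not the benchmark's actual schema:

```python
# Hypothetical record layout for citation text generation:
# cited papers + citing context in, citation text out.
from dataclasses import dataclass
from typing import List

@dataclass
class CitationExample:
    cited_abstracts: List[str]   # texts of the cited papers
    citing_context: str          # context surrounding the citation in the citing paper
    target_citation_text: str    # reference citation text to be generated

def to_seq2seq_input(ex: CitationExample) -> str:
    # A simple sequence-to-sequence baseline might condition on a single
    # concatenated string of the citing context and the cited papers.
    return ex.citing_context + "\n\n" + "\n\n".join(ex.cited_abstracts)
```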
Natural language processing researchers develop models of grammar, meaning and human communication based on written text. Due to task and data differences, what is considered text can vary substantially across studies. A conceptual framework for systematically capturing these differences is lacking. We argue that clarity on the notion of text is crucial for reproducible and generalizable NLP. Towards that goal, we propose common terminology to discuss the production and transformation of textual data, and introduce a two-tier taxonomy of linguistic and non-linguistic elements that are available in textual sources and can be used in NLP modeling. We apply this taxonomy to survey existing work that extends the notion of text beyond the conservative language-centered view. We outline key desiderata and challenges of the emerging inclusive approach to text in NLP, and suggest systematic community-level reporting as a crucial next step to consolidate the discussion.
Direct speech-to-speech translation (S2ST), in which all components can be optimized jointly, is advantageous over cascaded approaches for achieving fast inference with a simplified pipeline. We present a novel two-pass direct S2ST architecture, UnitY, which first generates textual representations and subsequently predicts discrete acoustic units. We enhance model performance through subword prediction in the first-pass decoder, an advanced two-pass decoder architecture design and search strategy, and better training regularization. To leverage large amounts of unlabeled text data, we pre-train the first-pass text decoder on a self-supervised denoising auto-encoding task. Experimental evaluations on benchmark datasets at various data scales demonstrate that UnitY outperforms a single-pass speech-to-unit translation model by 2.5-4.2 ASR-BLEU with a 2.83x decoding speed-up. We show that the proposed methods boost performance even when predicting a spectrogram in the second pass; predicting discrete units, however, achieves a 2.51x decoding speed-up over that variant.
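A minimal structural sketch of the two-pass idea described above, assuming PyTorch; module sizes are illustrative, masking and beam search are omitted, and this is not the released UnitY implementation:

```python
import torch
import torch.nn as nn

class TwoPassS2ST(nn.Module):
    """Skeleton of a two-pass direct S2ST model: speech encoder ->
    first-pass text decoder (subwords) -> second-pass unit decoder
    (discrete acoustic units) conditioned on first-pass decoder states."""

    def __init__(self, d_model=256, n_subwords=1000, n_units=1000):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.speech_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.text_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.unit_decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.subword_emb = nn.Embedding(n_subwords, d_model)
        self.unit_emb = nn.Embedding(n_units, d_model)
        self.subword_head = nn.Linear(d_model, n_subwords)
        self.unit_head = nn.Linear(d_model, n_units)

    def forward(self, speech_feats, subword_prev, unit_prev):
        enc = self.speech_encoder(speech_feats)          # (batch, src_len, d_model)
        # First pass: predict target-language subwords from the speech encoding.
        text_states = self.text_decoder(self.subword_emb(subword_prev), enc)
        # Second pass: predict discrete acoustic units, attending to the
        # first-pass decoder states rather than to raw text tokens.
        unit_states = self.unit_decoder(self.unit_emb(unit_prev), text_states)
        return self.subword_head(text_states), self.unit_head(unit_states)
```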
We apply topological data analysis (TDA) to speech classification problems and to the introspection of a pretrained speech model, HuBERT. To this end, we introduce a number of topological and algebraic features derived from Transformer attention maps and embeddings. We show that a simple linear classifier built on top of such features outperforms a fine-tuned classification head. In particular, we achieve an improvement of about $9\%$ accuracy and $5\%$ EER on four common datasets; on CREMA-D, the proposed feature set reaches a new state-of-the-art performance with an accuracy of $80.155$. We also show that topological features are able to reveal functional roles of speech Transformer heads; e.g., we find heads capable of distinguishing between pairs of sample sources (natural/synthetic) or voices without any downstream fine-tuning. Our results demonstrate that TDA is a promising new approach for speech analysis, especially for tasks that require structural prediction.
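A minimal sketch of the general recipe (attention-graph features plus a linear probe), assuming attention maps have already been extracted; the connectivity counts below are a crude stand-in for the paper's topological and algebraic features, not its actual feature set:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.linear_model import LogisticRegression

def connectivity_features(attn, thresholds=(0.01, 0.05, 0.1)):
    """Count connected components of thresholded attention graphs,
    a crude proxy for 0-dimensional topological features."""
    feats = []
    for head in attn:                    # head: (seq_len, seq_len) attention map
        sym = np.maximum(head, head.T)   # symmetrize to get an undirected graph
        for t in thresholds:
            graph = csr_matrix((sym >= t).astype(int))
            n_components, _ = connected_components(graph, directed=False)
            feats.append(n_components)
    return np.array(feats, dtype=float)

def fit_linear_probe(attn_maps, labels):
    # attn_maps: iterable of (n_heads, seq_len, seq_len) arrays, one per utterance.
    X = np.stack([connectivity_features(a) for a in attn_maps])
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, labels)
    return clf
```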
Access to the activity of subcortical structures offers a unique opportunity for building intention-dependent brain-computer interfaces, opens up a broad range of cognitive phenomena to study in affective neuroscience, including complex decision-making and the eternal free-will dilemma, and facilitates the diagnosis of a range of neurological diseases. So far, this has been possible only with bulky, expensive, and immobile fMRI equipment. Here we present an interpretable, domain-grounded solution for recovering the activity of several subcortical regions from multichannel EEG data and demonstrate up to 60% correlation between the actual subcortical blood-oxygenation-level-dependent (sBOLD) signal and its EEG-derived twin. Then, using a novel and theoretically justified weight-interpretation methodology, we recover individual spatial and time-frequency patterns of scalp EEG predictive of the hemodynamic signal in the subcortical nuclei. The described results not only pave the way towards wearable subcortical activity scanners, but also showcase an automatic knowledge-discovery process enabled by deep learning in combination with an interpretable, domain-constrained architecture and an appropriate downstream task.
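A minimal sketch of one way such an interpretable, domain-constrained EEG-to-BOLD regressor could be structured (an assumed architecture for illustration, not the authors' model): a learned spatial filter followed by per-filter temporal convolutions, whose weights stay inspectable as spatial and time-frequency patterns.

```python
import torch
import torch.nn as nn

class EEG2BOLD(nn.Module):
    def __init__(self, n_channels=64, n_spatial=8, kernel=64):
        super().__init__()
        # Spatial filters: fixed linear combinations of EEG channels.
        self.spatial = nn.Conv1d(n_channels, n_spatial, kernel_size=1, bias=False)
        # Per-filter temporal filters: interpretable as time-frequency patterns.
        self.temporal = nn.Conv1d(n_spatial, n_spatial, kernel_size=kernel,
                                  padding=kernel // 2, groups=n_spatial, bias=False)
        self.readout = nn.Linear(n_spatial, 1)

    def forward(self, eeg):                   # eeg: (batch, channels, time)
        h = torch.relu(self.temporal(self.spatial(eeg)))
        return self.readout(h.mean(dim=-1))   # one sBOLD estimate per window
```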
The Transformer is an extremely powerful and prominent deep learning architecture. In this work, we challenge the commonly held belief in deep learning that going deeper is better, and show an alternative design approach: building wider attention Transformers. We demonstrate that wide single-layer Transformer models can compete with or outperform deeper ones in a variety of Natural Language Processing (NLP) tasks when both are trained from scratch. We then systematically study the impact of changing the model aspect ratio on Transformers. This ratio balances the number of layers against the number of attention heads per layer while keeping the total number of attention heads and all other hyperparameters constant. On average, across 4 NLP tasks and 10 attention types, single-layer wide models perform 0.3% better than their deep counterparts. We provide an in-depth evaluation and demonstrate that wide models require a far smaller memory footprint and can run faster on commodity hardware; in addition, these wider models are also more interpretable. For example, a single-layer Transformer on IMDb byte-level text classification has 3.1x faster inference latency on a CPU than its equally accurate deeper counterpart, and is half its size. We therefore put forward wider and shallower models as a viable and desirable alternative for small models on NLP tasks, and as an important area of research for domains beyond NLP.
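A minimal sketch of the aspect-ratio comparison described above, assuming PyTorch; the dimensions are illustrative, not the paper's configurations. Both encoders have the same total number of attention heads; only the split between layers and heads per layer changes.

```python
import torch
import torch.nn as nn

d_model = 256

# Deep and narrow: 8 layers x 4 heads = 32 attention heads in total.
deep = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    num_layers=8,
)

# Wide and shallow: 1 layer x 32 heads = 32 attention heads in total.
wide = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=32, batch_first=True),
    num_layers=1,
)

x = torch.randn(2, 128, d_model)   # (batch, seq_len, d_model)
print(deep(x).shape, wide(x).shape)
```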
When learning from sensitive data, care must be taken to ensure that the training algorithm addresses privacy concerns. The canonical Private Aggregation of Teacher Ensembles, or PATE, computes output labels by aggregating the predictions of a (possibly distributed) collection of teacher models via a voting mechanism. The mechanism adds noise to obtain a differential privacy guarantee with respect to the teachers' training data. In this work, we observe that this use of noise, which makes PATE predictions stochastic, enables new forms of leakage of sensitive information. For a given input, our adversary exploits this randomness to extract a high-fidelity histogram of the votes submitted by the underlying teachers. From these histograms, the adversary can learn sensitive attributes of the input such as race, gender, or age. Although this attack does not directly violate the differential privacy guarantee, it clearly violates privacy norms and expectations, and it would simply not be possible without the noise inserted to achieve differential privacy. In fact, counter-intuitively, the attack becomes easier as we add more noise to provide stronger differential privacy. We hope this encourages future work to consider privacy holistically rather than treating differential privacy as a panacea.
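A minimal simulation of the core observation (not the paper's actual attack code): querying a noisy-argmax aggregator repeatedly for the same input and reading the teachers' vote histogram off the empirical label frequencies. The vote counts and noise scale below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_votes = np.array([60, 25, 10, 5])   # hidden per-class teacher votes
sigma = 10.0                             # noise scale of the aggregator

def noisy_aggregate(votes):
    # PATE-style noisy argmax: add noise to the votes, return the winning label.
    return int(np.argmax(votes + rng.normal(0, sigma, size=votes.shape)))

# Adversary: repeat the query and count how often each label wins.
n_queries = 20000
counts = np.bincount([noisy_aggregate(true_votes) for _ in range(n_queries)],
                     minlength=len(true_votes))
print("label win frequencies:", counts / n_queries)
# Higher-vote classes win more often; with enough queries the frequencies
# pin down the underlying histogram (and the sensitive attributes it encodes).
```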
Neural networks are vulnerable to adversarial examples, which can cause models to fail. Adversarial training is one of the solutions to stop adversarial examples: models are attacked during training and learn to be resilient to the attacks. However, this process is currently expensive; it takes a long time to produce adversarial samples and train models with them, and, worse, it occasionally fails. In this paper we demonstrate data pruning, a method to increase the efficiency of adversarial training through data subsampling. We empirically show that data pruning improves the convergence and reliability of adversarial training, albeit with varying levels of utility degradation. For example, with random subsampling on CIFAR10 we observe that removing 40% of the data costs 8% adversarial accuracy against the strongest attackers, while using only 20% of the data costs 14% adversarial accuracy and reduces runtime by a factor of 3. Interestingly, we find that in some settings data pruning brings the best of both worlds: it improves both adversarial accuracy and training time.
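A minimal sketch of random data pruning combined with PGD adversarial training, assuming PyTorch; `train_set` and `model` are placeholders, and the hyperparameters are illustrative rather than the paper's settings.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import Subset, DataLoader

def prune_random(train_set, keep_frac=0.6, seed=0):
    # Data pruning by random subsampling: keep a fixed fraction of the indices.
    g = torch.Generator().manual_seed(seed)
    n_keep = int(keep_frac * len(train_set))
    idx = torch.randperm(len(train_set), generator=g)[:n_keep]
    return Subset(train_set, idx.tolist())

def pgd_attack(model, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    # Standard PGD in the L-infinity ball around x.
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = (x_adv + alpha * grad.sign()).detach()
        x_adv = x + (x_adv - x).clamp(-eps, eps)
    return x_adv.clamp(0, 1)

def adversarial_train_epoch(model, train_set, optimizer, keep_frac=0.6):
    loader = DataLoader(prune_random(train_set, keep_frac),
                        batch_size=128, shuffle=True)
    model.train()
    for x, y in loader:
        x_adv = pgd_attack(model, x, y)          # attack the pruned batch
        optimizer.zero_grad()
        F.cross_entropy(model(x_adv), y).backward()
        optimizer.step()
```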
The class of deep deterministic off-policy algorithms is effective for solving challenging continuous control problems. However, current approaches use random noise as a common exploration method, which has several weaknesses, such as the need for manual tuning for a given task and the absence of exploratory calibration during training. We address these challenges by proposing a novel guided exploration method that uses a differential directional controller to incorporate scalable exploratory action correction. An ensemble of Monte Carlo critics that provides the exploratory direction serves as the controller. The proposed method improves on traditional exploration schemes by dynamically changing the exploration. We then present a novel algorithm that leverages the proposed directional controller for policy and critic modification. The proposed algorithm outperforms modern reinforcement learning algorithms on a variety of problems from the DMControl suite.
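One possible reading of the guided-exploration idea, sketched below under assumptions (an interpretation for illustration, not the authors' controller): instead of injecting unstructured noise, nudge the policy's action along the ascent direction suggested by an ensemble of critics.

```python
import torch

def guided_action(policy, critics, state, step_size=0.05):
    # Propose an action, then correct it in the direction that the
    # critic ensemble estimates will increase the return.
    action = policy(state).detach().requires_grad_(True)
    q_mean = torch.stack([critic(state, action) for critic in critics]).mean()
    grad = torch.autograd.grad(q_mean, action)[0]
    # Small directional correction replaces the usual random exploration noise.
    return (action + step_size * grad.sign()).detach().clamp(-1.0, 1.0)
```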
Defending against adversarial examples remains an open problem. A common belief is that randomness at inference time increases the cost of finding adversarial inputs. One example of such a defense is applying a random transformation to the input before feeding it to the model. In this paper, we empirically and theoretically investigate such randomized preprocessing defenses and demonstrate that they are flawed. First, we show that most randomized defenses are weaker than previously thought; they lack sufficient randomness to withstand even standard attacks such as projected gradient descent. This casts doubt on the long-held assumption that randomized defenses invalidate attacks designed to evade deterministic defenses and force attackers to integrate the Expectation over Transformation (EOT) concept. Second, we show that randomized defenses face a trade-off between adversarial robustness and model invariance: as the defended model acquires more invariance to the randomization, the defense becomes less effective. Future work will need to decouple these two effects. Our code is available in the supplementary material.
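A minimal sketch of the standard Expectation over Transformation (EOT) variant of PGD referenced above, assuming PyTorch; the random pixel-shift preprocessing is a placeholder defense for illustration, not the defenses evaluated in the paper.

```python
import torch
import torch.nn.functional as F

def random_transform(x):
    # Placeholder randomized preprocessing: a small random pixel shift.
    shift = int(torch.randint(-2, 3, (1,)))
    return torch.roll(x, shifts=shift, dims=-1)

def eot_pgd(model, x, y, eps=8 / 255, alpha=2 / 255, steps=20, eot_samples=10):
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        grad = torch.zeros_like(x)
        # Average gradients over the defense's randomness (the EOT step).
        for _ in range(eot_samples):
            loss = F.cross_entropy(model(random_transform(x_adv)), y)
            grad += torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * (grad / eot_samples).sign()
        x_adv = (x + (x_adv - x).clamp(-eps, eps)).clamp(0, 1)
    return x_adv.detach()
```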